Art Unit: 2172

DETAILED ACTION 

1. This Office Action is in response to Applicants' communications filed on
01/11/2002.

2. Claims 1-13 are pending in this application. 



Claim Rejections - 35 USC § 103 



3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. This application currently names joint inventors. In considering patentability of 
the claims under 35 U.S.C. 103(a), the examiner presumes that the subject matter of 
the various claims was commonly owned at the time any inventions covered therein 
were made absent any evidence to the contrary. Applicant is advised of the obligation 
under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was
not commonly owned at the time a later invention was made in order for the examiner to
consider the applicability of 35 U.S.C. 103(c) and potential 35 U.S.C. 102(e), (f) or (g)
prior art under 35 U.S.C. 103(a). 

5. Claims 1-13 are rejected under 35 U.S.C. 103(a) as being unpatentable over US
Patent No. 6,507,840 issued to Ioannidis et al. (hereinafter Ioannidis) in view of US
Patent No. 6,636,862 issued to Lundahl (hereinafter Lundahl).

With respect to claim 1, Ioannidis teaches a) determining a foreground frequency
of a bucket within a first cluster (using a histogram technique to determine the bucket and
its frequencies for the data distribution sets: col. 8, lines 62-67 and col. 9, lines 1-30);

b) determining a background frequency of the bucket with respect to all of the
clusters (ranges of attribute values into buckets: col. 10, lines 1-48);

c) comparing the foreground and background frequencies (comparing the data
distribution sets: col. 6, lines 52-67).

Ioannidis teaches using a bucket histogram technique for data clustering and the
distance between two multisets in parallel processing database systems (col. 5, lines
25-50 and col. 6, lines 32-50), each bucket being assumed to hold the values that fall
within the range of the bucket (col. 5, lines 52-67; also see col. 2, lines 16-65), the
frequencies representing sets of two data distributions (distance measurement based on
various distribution moments), and comparing the data distribution sets. Ioannidis does
not explicitly teach d) determining a quality index based on the comparison.

However, Lundahl teaches the index for a given data clustering (col. 13, lines
43-67 and col. 14, lines 1-8).

Therefore, it would have been obvious to a person of ordinary skill in the art at
the time the invention was made to combine the teachings of Ioannidis with the
teachings of Lundahl so as to compare the value of the index of a given data clustering
and ensure that the best index value is selected (col. 15, lines 1-25; also see fig. 3).
The motivation is to have a database for storing the data that provides the
clustering or partitioning advantage for a wider range of queries and that uses the bucket
histogram technique to compare the distance of data distribution sets and the results
from the clustering algorithms in a parallel processing system environment.

With respect to claim 2, Ioannidis teaches wherein said comparing step further
comprises subtracting the relative foreground and background frequencies (while
computing the distance of data distribution of bucket sets: col. 7, lines 15-67).

With respect to claim 3, Ioannidis discloses a method as discussed in claim 1.

Ioannidis teaches using a bucket histogram technique for data clustering and the
distance between two multisets in parallel processing database systems (col. 5, lines
25-50 and col. 6, lines 32-50), each bucket being assumed to hold the values that fall
within the range of the bucket (col. 5, lines 52-67; also see col. 2, lines 16-65), the
frequencies representing sets of two data distributions (distance measurement based on
various distribution moments), and comparing the data distribution sets. Ioannidis does
not explicitly teach d) determining a quality index based on the comparison, and does
not explicitly teach squaring the result of the comparison.

However, Lundahl teaches the sums of squares matrices for each cluster (col.
27, lines 18-67).

Therefore, it would have been obvious to a person of ordinary skill in the art at
the time the invention was made to combine the teachings of Ioannidis with the
teachings of Lundahl so as to have the sum of squares of the cluster result
compared and to compare the multi-dimensional qualities data structure having multiple
partitions and clusters that can be constructed for the same data. The motivation is
to have a database for storing the data that provides the clustering or partitioning
advantage for a wider range of queries and that uses the bucket histogram technique to
compare the distance of data distribution sets and the results from the clustering
algorithms in a parallel processing system environment.

With respect to claim 4, Ioannidis discloses a method as discussed in claim 1.
Ioannidis also teaches keeping the database up to date for processing operations (col.
3, lines 1-18).

Ioannidis teaches using a bucket histogram technique for data clustering and the
distance between two multisets in parallel processing database systems (col. 5, lines
25-50 and col. 6, lines 32-50), each bucket being assumed to hold the values that fall
within the range of the bucket (col. 5, lines 52-67; also see col. 2, lines 16-65), the
frequencies representing sets of two data distributions (distance measurement based on
various distribution moments), and comparing the data distribution sets. Ioannidis does
not explicitly teach e) determining an optimal number of clusters; and f) comparing the
optimal number of clusters to the actual number of clusters.

However, Lundahl teaches the optimal number of clusters (see fig. 5, col.
9, lines 1-20 and col. 13, lines 55-67 and col. 14, lines 1-8).

Therefore, it would have been obvious to a person of ordinary skill in the art at
the time the invention was made to combine the teachings of Ioannidis with the
teachings of Lundahl so as to have the sum of squares of the cluster result
compared and to compare the multi-dimensional qualities data structure having multiple
partitions and clusters that can be constructed for the same data. The motivation is
to have a database for storing the data that provides the clustering or partitioning
advantage for a wider range of queries and that uses the bucket histogram technique to
compare the distance of data distribution sets and the results from the clustering
algorithms in a parallel processing system environment.

With respect to claims 5-6, Ioannidis discloses a method as discussed in claim 1.
Ioannidis also teaches buckets (col. 9, lines 12-67).

Ioannidis teaches using a bucket histogram technique for data clustering and the
distance between two multisets in parallel processing database systems (col. 5, lines
25-50 and col. 6, lines 32-50), each bucket being assumed to hold the values that fall
within the range of the bucket (col. 5, lines 52-67; also see col. 2, lines 16-65), the
frequencies representing sets of two data distributions (distance measurement based on
various distribution moments), and comparing the data distribution sets. Ioannidis does
not explicitly teach wherein the optimal number of clusters is determined by a maximum
number of buckets for a variable, and wherein the optimal number of clusters is set to a
threshold value in case the maximum number of buckets is greater than the threshold
value.

However, Lundahl teaches the optimal number of clusters (see fig. 5, col.
9, lines 1-20 and col. 13, lines 55-67 and col. 14, lines 1-8) and the threshold value
(col. 13, lines 50-67, col. 21, lines 38-52 and col. 23, lines 36-61).

Therefore, it would have been obvious to a person of ordinary skill in the art at
the time the invention was made to combine the teachings of Ioannidis in view of Martin
with the teachings of Lundahl so as to have the sum of squares of the cluster result
compared and to compare the multi-dimensional qualities data structure having multiple
partitions and clusters that can be constructed for the same data (col. 4, lines 6-38, also
see abstract and col. 3, lines 4-15). The motivation is to have a database for storing
the data that provides the clustering or partitioning advantage for a wider range of
queries and that uses the bucket histogram technique to compare the distance of data
distribution sets and the results from the clustering algorithms in a parallel processing
system environment.

With respect to claims 7-9, Ioannidis discloses a method as discussed in claim 1.
Ioannidis also teaches the relative foreground and background frequencies (while
computing the distance of data distribution of bucket sets: col. 7, lines 15-67).

Ioannidis teaches using a bucket histogram technique for data clustering and the
distance between two multisets in parallel processing database systems (col. 5, lines
25-50 and col. 6, lines 32-50), each bucket being assumed to hold the values that fall
within the range of the bucket (col. 5, lines 52-67; also see col. 2, lines 16-65), the
frequencies representing sets of two data distributions (distance measurement based on
various distribution moments), and comparing the data distribution sets. Ioannidis does
not explicitly teach the optimal number of clusters, a normalizing value,
normalizing the result of the comparison, and summing the results of the corresponding
comparison values.

However, Lundahl teaches the optimal number of clusters (see fig. 5, col.
9, lines 1-20 and col. 13, lines 55-67 and col. 14, lines 1-8), multiplying and summing
the result (the product of the matrix: col. 5, lines 24-51), and normalizing the values (col.
12, lines 27-67).

Therefore, it would have been obvious to a person of ordinary skill in the art at
the time the invention was made to combine the teachings of Ioannidis with the
teachings of Lundahl so as to have the sum of squares of the cluster result
compared and to compare the multi-dimensional qualities data structure having multiple
partitions and clusters that can be constructed for the same data (col. 4, lines 6-38, also
see abstract and col. 3, lines 4-15). The motivation is to have a database for storing
the data that provides the clustering or partitioning advantage for a wider range of
queries and that uses the bucket histogram technique to compare the distance of data
distribution sets and the results from the clustering algorithms in a parallel processing
system environment.

With respect to claim 10, Ioannidis teaches performing a number of data
clustering operations (a number of operations to be applied on the histogram technique for
data distribution sets: col. 12, lines 4-67; also col. 4, lines 1-21).

Ioannidis teaches using a bucket histogram technique for data clustering and the
distance between two multisets in parallel processing database systems (col. 5, lines
25-50 and col. 6, lines 32-50), each bucket being assumed to hold the values that fall
within the range of the bucket (col. 5, lines 52-67; also see col. 2, lines 16-65), the
frequencies representing sets of two data distributions (distance measurement based on
various distribution moments), and comparing the data distribution sets. Ioannidis does
not explicitly teach determining a quality index for each result of the data clustering
operations; and c) selecting the result with the highest quality index as an end result of
the data clustering.

However, Lundahl teaches the index and the best index value to be selected 
(col. 13, lines 8-67 and col. 14, lines 1-8) and the highest value to be chosen (col. 13, 
lines 30-42). 

Therefore, it would have been obvious to a person of ordinary skill in the art at
the time the invention was made to combine the teachings of Ioannidis with the
teachings of Lundahl so as to have the best index value selected for the
comparison, to have the sum of squares of the cluster result compared, and to
compare the multi-dimensional qualities data structure having multiple partitions and
clusters that can be constructed for the same data (col. 4, lines 6-38, also see abstract
and col. 3, lines 4-15). The motivation is to have a database for storing the data
that provides the clustering or partitioning advantage for a wider range of queries and
that uses the bucket histogram technique to compare the distance of data distribution sets
and the results from the clustering algorithms in a parallel processing system
environment.

With respect to claim 11, Ioannidis teaches selecting an initial set of clusters
(selecting an initial element in the bucket: col. 13, lines 1-58).

Ioannidis teaches using a bucket histogram technique for data clustering and the
distance between two multisets in parallel processing database systems (col. 5, lines
25-50 and col. 6, lines 32-50), each bucket being assumed to hold the values that fall
within the range of the bucket (col. 5, lines 52-67; also see col. 2, lines 16-65), the
frequencies representing sets of two data distributions (distance measurement based on
various distribution moments), and comparing the data distribution sets. Ioannidis does
not explicitly teach determining a quality index for the clusters; and performing a number
of iterations to improve the quality index.

However, Lundahl teaches the number of iterations for the index to be chosen
(col. 15, lines 1-26) and the index and the best index value to be selected (col. 13,
lines 8-67 and col. 14, lines 1-8).

Therefore, it would have been obvious to a person of ordinary skill in the art at
the time the invention was made to combine the teachings of Ioannidis with the
teachings of Lundahl so as to have the best index value selected for the
comparison, to have the sum of squares of the cluster result compared, and to
compare the multi-dimensional qualities data structure having multiple partitions and
clusters that can be constructed for the same data (col. 4, lines 6-38, also see abstract
and col. 3, lines 4-15). The motivation is to have a database for storing the data
that provides the clustering or partitioning advantage for a wider range of queries and
that uses the bucket histogram technique to compare the distance of data distribution sets
and the results from the clustering algorithms in a parallel processing system
environment.

With respect to claim 12, Ioannidis teaches a method as discussed in claim 11.

Ioannidis teaches using a bucket histogram technique for data clustering and the
distance between two multisets in parallel processing database systems (col. 5, lines
25-50 and col. 6, lines 32-50), each bucket being assumed to hold the values that fall
within the range of the bucket (col. 5, lines 52-67; also see col. 2, lines 16-65), the
frequencies representing sets of two data distributions (distance measurement based on
various distribution moments), and comparing the data distribution sets. Ioannidis does
not explicitly teach determining the quality index for the modified clusters, and using the
modified clusters as a new initial set of clusters in case the quality index improved.

However, Lundahl teaches the index and the best index value to be selected (col. 
13, lines 8-67 and col. 14, lines 1-8). 

Therefore, it would have been obvious to a person of ordinary skill in the art at
the time the invention was made to combine the teachings of Ioannidis with the
teachings of Lundahl so as to have the best index value selected for the
comparison, to have the sum of squares of the cluster result compared, and to
compare the multi-dimensional qualities data structure having multiple partitions and
clusters that can be constructed for the same data (col. 4, lines 6-38, also see abstract
and col. 3, lines 4-15). The motivation is to have a database for storing the data
that provides the clustering or partitioning advantage for a wider range of queries and
that uses the bucket histogram technique to compare the distance of data distribution sets
and the results from the clustering algorithms in a parallel processing system
environment.

Claim 13 is essentially the same as claim 1 except that it is directed to a
computer program product rather than a method, and is rejected for the same reason
as applied to claim 1 hereinabove.

6. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Anh Ly whose telephone number is 703 306-4527 or via
E-Mail: ANH.LY@USPTO.GOV. The examiner can normally be reached from 7:30 AM to
4:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, John Breene, can be reached on 703 305-9790. The fax phone number for
the organization where this application or proceeding is assigned is 703 746-7239.

Any response to this action should be mailed to:

Commissioner of Patents and Trademarks

Washington, D.C. 20231

or faxed to: Central Office (703) 872-9306 (Central Official Fax Number)

Hand-delivered responses should be brought to Crystal Park II, 2121 Crystal
Drive, Arlington, VA, Fourth Floor (receptionist).

Any inquiry of a general nature or relating to the status of this application or
proceeding should be directed to the receptionist whose telephone number is 703
308-6606 or 703 305-3900.